Abstract. We study the possibility of reconstructing the primary mass
composition using combinations of basic shower characteristics measured in
hybrid experiments, such as the depth of shower maximum from the fluorescence
side and the signal in water Cherenkov tanks or in plastic scintillators from
the ground side. To optimize the discrimination performance of these
combinations of shower observables we apply Fisher's discriminant analysis and
give statistical estimates of the separation of the resulting distributions of
Fisher variables for proton and iron primaries. At the final stage we apply
Multiparametric Topological Analysis to these distributions to extract the
composition from prepared mixtures with known fractions of showers from
different primary particles. It is shown that, owing to the high sensitivity
of water tanks to muons, the combination of the tank signal with X_max looks
especially promising for mass composition analysis, provided the energy is
determined from the longitudinal shower profile.
Introduction

The experimental information on the UHECR mass composition coming from
different experiments and from different mass reconstruction techniques is
quite contradictory [1], [2]. One of the main difficulties is that at these
energies the mass composition and the properties of hadronic interactions are
both unknown and deeply entangled. A necessary condition for solving this
problem is to reconcile the mass composition results obtained from different
types of ground and fluorescence data, both with each other and with
astrophysical predictions on the origin of the anisotropy, the 'ankle' and the
GZK cut-off. Hybrid experiments, like the Pierre Auger Observatory [5] or the
Telescope Array (TA) [6], are perfectly suited for this purpose, since they
provide the opportunity to use combinations of extensive air shower (EAS)
parameters to achieve the best possible mass resolution. The key role in this
analysis can be played by the muon content of the shower, which is the most
problematic quantity for hadronic models [3] and the most mass-sensitive EAS
parameter. The upgrade of Auger with the AMIGA array of scintillator counters
[4] is aimed precisely at measuring the muon content, but it is easy to show
(see Section I) that already the total signal in the Auger water tanks
preserves the difference between primaries in the number of muons and can be
useful for primary mass reconstruction, provided the energy is independently
determined from the longitudinal shower profile. In the case of TA, which will
be able to measure only the charged particle density, ground data alone will
be weakly sensitive to the primary particle mass, and the use of combinations
of EAS observables becomes indispensable. Using Auger and TA as examples, in
this paper we put forward a strategy for reconstructing the primary mass
composition from combinations of fluorescence and ground data, keeping in mind
the limitations on the affordable simulation statistics of UHECR showers.

I. General notes on the choice of mass discrimination parameters

In the following we consider the cases of Auger and TA to estimate the
expected performance of the proposed mass reconstruction technique. We assume
that the primary energy can be estimated from the longitudinal shower profile
and hence is practically independent of the primary mass. In brief, to enhance
the primary mass resolution of the traditional X_max parameter we suggest
using it in linear combinations with other basic shower parameters, such as
the signal in water tanks for Auger and the particle density for TA.

The data set used for the analysis was generated with the CORSIKA 6.204 [7]
(QGSJET 01 [8]/Gheisha [9]) and CORSIKA 6.735 (QGSJET II [10]/Fluka2008.3
[11]) packages and contains 1000 showers for every primary (p, O, Fe) and
interaction model at 10 EeV and 37° zenith angle. All longitudinal shower
characteristics and the charged particle density were taken directly from the
CORSIKA output files. The expected signal in the Auger water tanks was
calculated according to the procedure described in [12], [13], using the same
GEANT 4 lookup tables as in [13]. In the case of TA we use the charged
particle density at 1000 m from the axis (D_1000) as an example; for densities
at other distances the line of reasoning would be the same. Finally, let us
note that the results for the two combinations of interaction models used in
the study are qualitatively very similar, so below we will mostly discuss only
the results for QGSJET 01/Gheisha, which provides somewhat worse
discrimination performance. To characterize the separation of distributions we
will use the merit factor

\[
\mathrm{MF} = \frac{|\langle p \rangle - \langle \mathrm{Fe} \rangle|}{\sqrt{\sigma_p^2 + \sigma_{\mathrm{Fe}}^2}},
\]

where ⟨p⟩, ⟨Fe⟩ and σ_p, σ_Fe are the distribution means and standard
deviations, respectively.

Fig. 1. Depth of shower maximum versus total ground plane signal and particle
density at 1000 meters from the axis for proton (red squares) and iron (blue
crosses) showers, for the QGSJET 01 model.
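As a minimal sketch of how such a merit factor can be evaluated, the snippet below assumes the usual definition (absolute difference of the means divided by the square root of the summed variances). The function name `merit_factor` and the toy Gaussian samples are illustrative assumptions and are not taken from the simulations discussed here.

```python
import numpy as np

def merit_factor(a, b):
    """Merit factor of two samples: |mean_a - mean_b| / sqrt(var_a + var_b).

    Assumes the usual definition; sample variances use ddof=1.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return abs(a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) + b.var(ddof=1))

# Toy X_max-like samples in g/cm^2 (illustrative numbers, not simulation output).
rng = np.random.default_rng(0)
xmax_p = rng.normal(780.0, 60.0, 1000)
xmax_fe = rng.normal(700.0, 25.0, 1000)
mf = merit_factor(xmax_p, xmax_fe)
print(f"toy merit factor: {mf:.2f}")
```

A merit factor well above 1 indicates distributions whose means differ by more than their combined width.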

In Fig. 1 we present scatter plots of the total ground plane signal and the
charged particle density at 1000 m from the shower axis versus X_max. The good
separation of iron and proton showers in the (S^tot_1000, X_max) plot is due
both to the discrimination power of X_max and to a noticeable difference of
~13% in the average total signals. In absolute units this difference is the
same as the difference between the muon signals, but the separation of the
total signals (MF=1.4) is clearly worse than that of the muon signals (MF=2.5)
due to the smearing effect of the electromagnetic component. Although the
ratio of electromagnetic to muon signal changes with distance, a good
separation of protons and iron nuclei is maintained over a wide range from 700
to 1500 meters, since the high sensitivity of Cherenkov tanks to the muon
component results in different shifts of the p and Fe populations along the
signal axis in the X_max vs total signal scatter plots. On the other hand, as
expected, the separation between primaries in the (D_1000, X_max) plot is
mostly due to the discrimination power of X_max, since the distributions of
the charged particle density for protons and iron nuclei largely overlap
(MF=0.5). In this case one could search for other combinations of
discrimination parameters, but as will be shown below, the (D_1000, X_max)
pair provides one of the best discrimination resolutions possible within TA
conditions.

II. Fisher's discriminant analysis 

The problem of primary particle mass discrimination using a combination of two
or more shower characteristics belongs to the class of standard tasks of
statistical pattern classification analysis (see e.g. [14], [15], [16]), and
one of its methods, linear discriminant analysis, was recently applied to
study the classification capability of the longitudinal profile distribution
parameters [17]. Using the Toolkit for Multivariate Data Analysis (TMVA) [16],
here we perform a similar study for combinations of different fluorescence and
ground data.

As already discussed, the p and Fe populations are well separated in both
examples in Fig. 1 and, moreover, they can be separated with high accuracy
even by a straight line. Hence, it is appropriate in this case to apply just
linear discriminant analysis, namely Fisher's method. In this approach one
seeks the direction in the parameter hyperspace along which the two classes
are best separated, i.e. the direction for which, after projection onto it,
the ratio of the squared distance between the distribution means to the sum of
their variances is maximized. The evident advantages of this approach are the
possibility to use any number of parameters while avoiding the "curse of
dimensionality" (thus reducing the necessary simulation statistics) and to
easily apply any further classification tools to the resulting one-dimensional
distributions. In addition to Fisher's discriminant, the performance of
rectangular cut optimization, the projective likelihood estimator and function
discriminant analysis with quadratic and cubic functions [16] was checked, and
it was found that none of them outperforms Fisher's approach.
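The projection described above can be illustrated with a short NumPy sketch. It is not the TMVA implementation used in this work: the helper names, the toy two-parameter samples and their means and widths are all assumptions chosen only to show that the Fisher direction gives a merit factor at least as large as either input parameter alone.

```python
import numpy as np

def fisher_direction(sig, bkg):
    """Unit vector w proportional to W^-1 (mean_sig - mean_bkg),
    where W is the sum of the two class covariance matrices."""
    sig = np.asarray(sig, dtype=float)
    bkg = np.asarray(bkg, dtype=float)
    within = np.cov(sig, rowvar=False) + np.cov(bkg, rowvar=False)
    w = np.linalg.solve(within, sig.mean(axis=0) - bkg.mean(axis=0))
    return w / np.linalg.norm(w)

def merit_factor(a, b):
    """|mean_a - mean_b| / sqrt(var_a + var_b) for two 1D samples."""
    return abs(a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) + b.var(ddof=1))

# Toy (X_max, ground signal) pairs in arbitrary units; not simulation output.
rng = np.random.default_rng(1)
p = rng.multivariate_normal([780.0, 30.0], [[3600.0, 0.0], [0.0, 16.0]], 1000)
fe = rng.multivariate_normal([700.0, 34.0], [[625.0, 0.0], [0.0, 16.0]], 1000)

w = fisher_direction(p, fe)
mf_fisher = merit_factor(p @ w, fe @ w)       # separation along Fisher axis
mf_xmax = merit_factor(p[:, 0], fe[:, 0])     # first parameter alone
mf_signal = merit_factor(p[:, 1], fe[:, 1])   # second parameter alone
print(f"Fisher: {mf_fisher:.2f}, X_max only: {mf_xmax:.2f}, signal only: {mf_signal:.2f}")
```

Because the direction is obtained from the same events it is evaluated on, the comparison is in-sample; an independent test set would be needed for an unbiased estimate.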

To find the direction along which the primaries are separated in the optimal
way, Fisher's algorithm requires minimal training: as few as 5-10 events of
every primary type can be enough to achieve the same results as with several
hundred events. After applying Fisher's method one gets a new variable, which
is simply the linear combination of the original variables that provides the
optimal separation in the one-dimensional case. To characterize the
discrimination capability of different parameter combinations after applying
Fisher's technique, in Table I we give for them the merit factors MF, areas A,
separations ⟨S²⟩ and misclassification rates ε. Taking protons as 'signal' and
iron nuclei as 'background', one can consider A as the area under the signal
efficiency versus background rejection curve [16] (also called the receiver
operating characteristic curve [14]); the closer this area is to unity, the
better the classification. The separation is defined in [16] as

\[
\langle S^2 \rangle = \frac{1}{2} \int \frac{\left(\hat{y}_S(y) - \hat{y}_B(y)\right)^2}{\hat{y}_S(y) + \hat{y}_B(y)} \, dy
\]

where ŷ_S(y) and ŷ_B(y) are the probability density functions for signal and
background; ⟨S²⟩ = 1 again means the best separation and corresponds to
distributions without overlap. The misclassification rate, used in addition to
these statistical variables, is calculated in a very simple way to estimate
the possible error in an event-by-event classification approach. We fit the
overlapping sides of the distributions of Fisher variables for protons and
iron nuclei with Gaussian functions and consider all events to the left of the
intersection point of these Gaussian fits as iron and all other events as
protons. In this case some number ε_p of proton events is recognized as iron
and, vice versa, some number ε_Fe of iron events is classified as protons.
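The three quantities just described can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used for Table I: the separation is estimated from simple histograms, the area is computed as the probability that a signal event lies above a background event, and the Gaussian parameters are taken from sample means and standard deviations instead of fits to the overlapping sides. All function names and the toy Fisher-variable samples are hypothetical.

```python
import numpy as np

def separation(sig, bkg, bins=50):
    """Histogram estimate of <S^2> = 1/2 * int (ys - yb)^2 / (ys + yb) dy."""
    edges = np.linspace(min(sig.min(), bkg.min()), max(sig.max(), bkg.max()), bins + 1)
    ys, _ = np.histogram(sig, bins=edges, density=True)
    yb, _ = np.histogram(bkg, bins=edges, density=True)
    total = ys + yb
    mask = total > 0  # skip empty bins to avoid 0/0
    width = edges[1] - edges[0]
    return 0.5 * np.sum((ys[mask] - yb[mask]) ** 2 / total[mask]) * width

def roc_area(sig, bkg):
    """Area under signal efficiency vs background rejection for cuts 'y > c',
    equal to the probability that a signal event exceeds a background event."""
    below = np.searchsorted(np.sort(bkg), sig, side="left")
    return below.mean() / len(bkg)

def gaussian_intersection(mu1, s1, mu2, s2):
    """Point between mu1 and mu2 where two unit-area Gaussian pdfs are equal."""
    mid = 0.5 * (mu1 + mu2)
    if np.isclose(s1, s2):
        return mid
    a = 1.0 / (2 * s1**2) - 1.0 / (2 * s2**2)
    b = mu2 / s2**2 - mu1 / s1**2
    c = mu1**2 / (2 * s1**2) - mu2**2 / (2 * s2**2) + np.log(s1 / s2)
    roots = np.roots([a, b, c])
    real = roots[np.isreal(roots)].real
    lo, hi = min(mu1, mu2), max(mu1, mu2)
    inside = real[(real >= lo) & (real <= hi)]
    return inside[0] if inside.size else mid  # fall back to the midpoint

def misclassified(p, fe):
    """Events on the wrong side of the cut; iron is assumed to lie to the left."""
    cut = gaussian_intersection(fe.mean(), fe.std(ddof=1), p.mean(), p.std(ddof=1))
    return cut, int(np.sum(p < cut)), int(np.sum(fe >= cut))

# Toy Fisher-variable samples (illustrative, not simulation output).
rng = np.random.default_rng(2)
f_p = rng.normal(1.5, 1.0, 1000)
f_fe = rng.normal(-1.5, 0.7, 1000)
sep = separation(f_p, f_fe)
area = roc_area(f_p, f_fe)
cut, eps_p, eps_fe = misclassified(f_p, f_fe)
print(f"<S^2>={sep:.3f}  A={area:.3f}  cut={cut:.2f}  eps_p={eps_p}  eps_Fe={eps_fe}")
```

The histogram estimate of the separation depends on the binning, so values obtained this way are only comparable when the same binning is used for all parameter combinations.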

From Table I one can see that the discrimination for both high-energy
interaction models is very similar, though in the case of QGSJET II the
separation of p and Fe is slightly better, especially for combinations of the
total signal and X_max with the LDF slope parameter. Certainly, the
combination of X_max with the muon signal at 1000 meters provides the best
discrimination, but as one can see, combinations of the depth of shower
maximum with the total signal in the tanks in the range 700-1500 meters also
provide excellent separation of primaries, with misclassification of only
~30-50 events out of 2000 (about 1.5-2.5%). Adding other shower
characteristics to this pair does not significantly improve the discrimination
capability and, in the case of real data, can be completely useless due to the
presence of additional systematic errors, though the situation can of course
change with energy and zenith angle. Taking into account the robustness of the
total signal at 1000 m to LDF reconstruction uncertainties, the combination
(X_max, S^tot_1000) in our view looks like the optimal choice for primary mass
composition analysis in Auger experimental conditions. Table I also shows
that, despite the weak discrimination power of the charged particle density,
its use together with X_max allows one to achieve a separation of primaries
with MF=1.44 (for QGSJET 01), while for the X_max distributions alone the
merit factor is equal to 1.16. At the considered energy and zenith angle the
(D_1000, X_max) pair looks like the best choice for primary mass
reconstruction with TA.

Certainly, our conclusions on (S^tot_1000, X_max) and (D_1000, X_max) as the
best mass discrimination combinations are specific to the energy and zenith
angle discussed, in the sense that for other energies and angles the addition
of other parameters to these basic pairs may help to optimize their mass
discrimination performance.

TABLE I

Discrimination performance of different combinations of shower parameters
after application of Fisher's discriminant analysis.

QGSJET 01

Parameters                   Area    ⟨S²⟩    MF      ε_p     ε_Fe
[S^μ_1000, X_max]            1.000   0.995   2.53    10      2
[S^tot_700, X_max]           0.996   0.908   1.90    35      15
[S^tot_1000, X_max]          0.996   0.932   2.02    32      14
[S^tot_1500, X_max]          0.997   0.940   2.18    33      14
[D_1000, X_max]              0.957   0.677   1.44    139     65
[LDF β, X_max]               0.925   0.578   1.29    184     97
[S^tot_1000, LDF β]          0.934   0.627   1.49    172     78
(label lost)                 0.997   0.956   2.08    20      7
[?_max, ?_1000, X_max]       0.999   0.946   2.16    25      (missing)

QGSJET II

Parameters                   Area    ⟨S²⟩    MF      ε_p     ε_Fe
[S^μ_1000, X_max]            1.000   0.985   2.70    11      1
[S^tot_700, X_max]           0.999   0.961   2.18    24      8
[S^tot_1000, X_max]          0.996   0.942   2.25    28      7
[S^tot_1500, X_max]          0.994   0.937   2.32    21      11
[D_1000, X_max]              0.975   0.770   1.65    99      54
[LDF β, X_max]               0.947   0.674   1.51    135     75
[S^tot_1000, LDF β]          0.952   0.718   1.64    124     73
(label lost)                 0.999   0.966   2.36    17      4
[?_max, ?_1000, X_max]       0.997   0.953   2.33    23      5

III. Extraction of composition from test samples with Multiparametric Topological Analysis

The basic idea behind the Multiparametric Topological Analysis (MTA) [18] is
the classification of showers from different primaries according to their
topological distribution in a multiparametric space. Considering Fig. 1, one
can divide the (X_max, S^tot_1000) plane into a number of cells and find the
probabilities for the showers falling in a particular cell to be initiated by
a proton or by an iron nucleus. Using only these probabilities on a pure set
of proton showers, one will erroneously arrive (in the case of overlap of the
p and Fe populations) at a mixed composition. To correct such
misclassification it is also necessary to compute the mixing probabilities
[18], which determine the chance for an event from one primary mass in a given
cell to be misclassified as an event of another primary mass. Hence, to get
both types of probabilities one has to use two independent sets of simulated
events. To illustrate the classification capability of MTA combined with the
discrimination power of Fisher's method, we have performed primary composition
reconstruction of sample mixtures with known fractions of protons, oxygen and
iron nuclei. In Figs. 2, 3 we present the results of applying MTA to the
one-dimensional distributions of Fisher's variables F(X_max, S^tot_1000) and
F(X_max, D_1000). The composition is very well reproduced when one uses the
(X_max, S^tot_1000) combination, with errors of 2-3% for [p, Fe] and 3-5% for
[p, O] mixtures. The discrimination power of the (D_1000, X_max) pair is
clearly worse (errors are 3-5% for [p, Fe] and 8-10% for [p, O] mixtures), and
in real experimental conditions with additional systematic errors its primary
mass classification performance can be of limited use when a [p, O] mixture is
considered.
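The cell-based reconstruction described in this section can be sketched in a few lines. This is only a simplified reading of the description above and not the implementation of [18]: cells are taken as bins of a one-dimensional Fisher variable, the mixing probabilities are collected into a matrix estimated from a second independent set, and the true fractions are obtained by solving the resulting linear system. Function names, binning and the toy samples are assumptions.

```python
import numpy as np

def cell_probabilities(train, edges):
    """P(class | cell) from a labelled training set; rows follow the key order of `train`."""
    counts = np.array([np.histogram(v, bins=edges)[0] for v in train.values()], dtype=float)
    total = counts.sum(axis=0)
    safe_total = np.where(total > 0, total, 1.0)
    # Cells without training events carry no information: share them equally.
    return np.where(total > 0, counts / safe_total, 1.0 / counts.shape[0])

def apparent_fractions(values, probs, edges):
    """Average the per-cell class probabilities over all events of a sample."""
    cells = np.clip(np.digitize(values, edges) - 1, 0, len(edges) - 2)
    return probs[:, cells].mean(axis=1)

def reconstruct_fractions(sample, train, check, edges):
    """Correct the apparent fractions with a mixing matrix from an independent set."""
    probs = cell_probabilities(train, edges)
    # Column j: how a pure sample of class j looks after cell classification.
    mixing = np.column_stack(
        [apparent_fractions(check[name], probs, edges) for name in train]
    )
    observed = apparent_fractions(sample, probs, edges)
    fractions, *_ = np.linalg.lstsq(mixing, observed, rcond=None)
    return dict(zip(train, fractions))

# Toy Fisher-variable samples (illustrative, not simulation output).
rng = np.random.default_rng(3)
train = {"p": rng.normal(1.5, 1.0, 1000), "Fe": rng.normal(-1.5, 0.7, 1000)}
check = {"p": rng.normal(1.5, 1.0, 1000), "Fe": rng.normal(-1.5, 0.7, 1000)}
sample = np.concatenate([rng.normal(1.5, 1.0, 300), rng.normal(-1.5, 0.7, 700)])
edges = np.linspace(-5.0, 6.0, 23)

fractions = reconstruct_fractions(sample, train, check, edges)
print({name: round(float(value), 3) for name, value in fractions.items()})
```

Using one set for the cell probabilities and a separate one for the mixing matrix mirrors the requirement of two independent simulated sets; with a single set the correction would be estimated on the same events that defined the cells.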

IV. Conclusions 

The present study allows us to develop a new approach to the mass composition
analysis of hybrid data. We propose to use combinations of longitudinal and
lateral parameters to achieve maximum separation of primaries in a
multiparametric space. Further application of Fisher's method optimizes the
discrimination, reducing the problem to the one-dimensional case and allowing
for lower simulation statistics. At the last stage one can apply different
algorithms to extract the mass composition from the distributions of Fisher
variables, which are more mass sensitive than e.g. the traditionally used
X_max alone. For this purpose we applied the MTA technique to samples with
different fractions of primaries and retrieved the nuclei abundances from
[p, Fe] and [p, O] mixtures with very good accuracy.

Fig. 2. Proton (red squares) and iron (blue crosses) abundances in samples
with known primary content, reconstructed with MTA from the distributions of
Fisher's variables F(X_max, S^tot_1000) and F(X_max, D_1000). Lines mark the
exact reconstruction results.

Regarding the choice of mass sensitive parameters, it was shown that for Auger
the best mass discrimination can be achieved by using the (X_max, S^tot_1000)
pair, provided the primary energy is estimated from the longitudinal shower
profile. The charged particle density measured in TA, in combination with the
depth of shower maximum, also provides good discrimination of proton and iron
showers, though in real experimental conditions its sensitivity seems to be
limited for the proton-oxygen mixture case.